Hierarchical link clustering

A Link community detection method.

Time complexity

It involves the following steps:

  1. Similarity calculation
  2. Hierarchical clustering

Similarities between edges are calculated by visiting each node () and measuring the Jaccard coefficient of each edge pair. The time complexity of each operation (Jaccard) is , where and are the degrees of the two nodes that are connected to . The worst case is a hub that is connected to everyone: . If the degree distribution is bounded or if we ignore the pairs with at least one node with more than a certain degree, then the complexity is .

Then, we need of this operation for each node . There are nodes that have a degree . For every node, the number of operations is Without hubs (e.g. Poisson distribution), the time complexity is as converges to a value that does not scale as . If , the sum scales logarithmically, and thus the time complexity becomes . For a general case with (), using (see this), we can calculate the sum scales as . The time complexity is therefore .

The whole process of the similarity calculation requires

Once we have the similarities of edge pairs, we can run the full hierarchical clustering or sample several thresholds to identify the optimal threshold. If we do the full clustering, we would need . If we use the sampling approach, (need more analysis)…

So the total time complexity is and it’ll depend on the types of networks. For instance, if the network is a scale-free network with exponent 3, if we set a certain degree threshold for calculating the similarity, and if the number of edges is or smaller, then the total time complexity will be .